Data Assessment

Project Title: FOOD AND DRUG RECALL SYSTEM ANALYSIS

Group 5

Introduction

A product recall is a request to return, exchange, or replace a product after the manufacturer or a consumer protection organisation finds flaws that might impair performance, endanger customers, or create legal problems for the manufacturer. Products that are dangerous or have manufacturing flaws frequently reach store shelves and are sold to customers. A recall is the process used to remove these unsafe or flawed products from customers' hands. Recalls may occasionally lead to product liability claims.

We defined a clear analytical problem for our project: ‘To identify and analyze the reasons for product recalls in the food, medical devices, and other consumer goods industries and evaluate the effectiveness of the recall processes in ensuring public safety. The goal is to identify patterns and trends in recall data to inform regulatory and industry efforts to prevent future recalls and improve the overall safety of consumer products.’ The next step of our project was to identify a dataset connected to this business problem and to check the quality of the data, its suitability, and its usefulness for answering the problem. We had access to many such datasets covering past years, but we wanted recent data in order to predict future recalls, so we examined the candidates, selected one dataset, and carried out the data assessment described here.

About the dataset

This dataset, intended for recall prediction, comes from the U.S. Food and Drug Administration (FDA) website. It is the latest available data and contains information about different product types, such as Biologics, Devices, Drugs, Food/Cosmetics, Tobacco, and Veterinary, along with the status of each recall, the reason for the recall, and the state associated with it. We reviewed many datasets on websites such as:

During this review we found that food and drug related datasets are hard to find, as the ones available show no important correlation between their variables. The two links above do not have enough recall data, and they do not contain any numerical values. The link above has recall data for Canada and the U.S. We decided to analyze the U.S. food and drug recall data because that dataset has 17 features in total, whereas the Open Canada website offers only a limited amount of data and does not distinguish between product types. In addition, the Open Canada website has only past data, while the FDA provides current data as well as different product types, which makes the analysis easier. With so many problems and only older data in the other sources, we could not have identified patterns for near-future recall trends from them.

Our dataset has 81,697 records (rows) and 17 features (columns) in total. It does not have any missing values. The features are described below:

1) FEI number: - The FEI number is a unique identifier assigned by the FDA to identify firms associated with FDA regulated products.

2) Recalling Firm Name: - the name of the firm that recalls the product from the market.

3) Product type: - the type of product recalled from the market.

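4) Product classification: - the recall class assigned to the product, one of three classes indicating the degree of health hazard. Class I: there is a reasonable probability that using or being exposed to the product will cause serious adverse health consequences or death. Class II: the product may cause temporary or medically reversible adverse health consequences, or the probability of serious adverse consequences is remote. Class III: the product is not likely to cause adverse health consequences.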

5) Status: - the current status of the recall.

6) Distribution Pattern: - where the product was distributed.

7) Recalling Firm City, Recalling Firm State, Recalling Firm Country: - the location of the firm that recalls the product.

8) Center Classification Date: - the date on which the FDA center classified the recall.

9) Reason for Recall: - the reason why the product was recalled.

10) Product Description: - a description of the recalled product.

11) Event id: - an identifier that uniquely identifies a particular recall event.

12) Product id: - the identification number of the product.

13) Center: - the center within the FDA that regulates the recalled product. For example, one center regulates biological products for human use under applicable federal laws, including the Public Health Service Act and the Federal Food, Drug and Cosmetic Act.

14) Recall details: - a URL where the full details of the recalled product can be viewed.
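The row count, feature count, and missing-value claim stated before the feature list can be checked with a short script. The following is a minimal sketch using only the Python standard library. The inline sample rows, firm names, and the `summarize` helper are hypothetical and used only for illustration; the real export would be read from a file instead.

```python
import csv
import io

# Hypothetical three-row sample with a subset of the columns.
# The real export is reported to have 81,697 rows and 17 columns.
SAMPLE_CSV = """FEI Number,Recalling Firm Name,Product Type,Product Classification,Status
3001234567,Example Foods Inc,Food/Cosmetics,Class II,Ongoing
3007654321,Example Devices LLC,Devices,Class I,Terminated
3009999999,Example Pharma Co,Drugs,,Completed
"""


def summarize(csv_text):
    """Return (row_count, column_count, missing_per_column) for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # Treat absent cells and blank or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return rows, len(columns), missing


rows, cols, missing = summarize(SAMPLE_CSV)
print(rows, cols, missing)
```

In this sample the third row deliberately leaves the classification blank, so the check reports one missing value. Run against the full file, every per-column count should be zero if the "no missing values" claim holds.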

The data itself is pretty clean, as it does not have any missing values, and the heat map shows that there is little correlation between the attributes.
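The report does not say which measure was used for the heat map. Because most of the features are categorical, one option is Cramér's V, which ranges from 0 (no association) to 1 (perfect association). The sketch below is an assumption about how such a value could be computed for one pair of columns, not the method actually used in the project; the `cramers_v` helper and the sample values are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of category labels."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed count for each (x, y) pair
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n  # count expected under independence
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals)) - 1
    if k == 0:
        # A column with a single category carries no association information.
        return 0.0
    return math.sqrt(chi2 / (n * k))


# Hypothetical values for two columns of the recall data.
product_type = ["Food/Cosmetics", "Food/Cosmetics", "Drugs", "Drugs"]
recall_class = ["Class I", "Class II", "Class I", "Class II"]
print(cramers_v(product_type, recall_class))
```

Computing this value for every pair of categorical columns gives a matrix that can be drawn as a heat map. Values close to 0 across the matrix would be consistent with the observation that the attributes are only weakly related.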

The FDA is responsible for protecting public health by ensuring the safety, efficacy, and security of human and veterinary drugs, biological products, and medical devices. The dataset provides information on product recalls initiated by companies and manufacturers that are regulated by the FDA. It covers a variety of products, including drugs, biological products, medical devices, and food, so the data is sufficient to compare different product types by number of recalls. We can therefore answer the research question from this dataset.
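The comparison of product types by number of recalls, and the search for trends over time, can be sketched as simple counts. The example below is illustrative only: the sample records are hypothetical, the helper names are ours, and the MM/DD/YYYY date format is an assumption about how Center Classification Date appears in the export.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (product type, center classification date).
# The MM/DD/YYYY date format is an assumption about the export.
sample_records = [
    ("Devices", "01/15/2022"),
    ("Food/Cosmetics", "03/02/2022"),
    ("Drugs", "11/30/2021"),
    ("Devices", "02/10/2023"),
    ("Devices", "07/19/2022"),
    ("Drugs", "05/05/2023"),
]


def count_by_type(records):
    """Number of recalls per product type, most frequent first."""
    return Counter(ptype for ptype, _ in records).most_common()


def count_by_year(records, fmt="%m/%d/%Y"):
    """Number of recalls per calendar year, in chronological order."""
    years = Counter(datetime.strptime(date, fmt).year for _, date in records)
    return dict(sorted(years.items()))


by_type = count_by_type(sample_records)
by_year = count_by_year(sample_records)
print(by_type)
print(by_year)
```

The same two counts, computed on the full dataset, give a first view of which product types are recalled most often and whether recalls are rising or falling from year to year.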

Although all of the datasets used for this analytics project were obtained from open data sources, we reviewed the terms and conditions of use for each data source to ensure that the ethical guidelines of Consent, Clarity, Consistency, Control, and Consequences are followed to the letter.

Consent: The FDA recalls dataset includes information on products and companies that are regulated by the agency. This information is sensitive and confidential, so it is important to consider the privacy and confidentiality of the individuals and companies involved in the recall process.

Clarity: The dataset contains well-structured and well-organized data, along with detailed explanations of each recall, which supports analyzing the data and predicting future trends.

Consistency: The data is consistent in terms of the information that is included for each recall. We’ll ensure that users can easily compare different recalls, regardless of the type of product or the company responsible for the recall.

Control: The FDA is responsible for maintaining the accuracy and completeness of the data. It may also take steps to protect the privacy and security of the data, such as redacting sensitive information or limiting access to the data to authorized users.

Consequences: The data is used to protect public health and to provide important information to consumers, health care professionals, and others. However, it's also important to consider the potential consequences of using the data, such as the risk of harm to consumers or the potential for misuse of the information.

Data Science Ethics Checklist


Exploratory Data Analysis

https://asq.org/quality-resources/recalls

https://www.politicalsciencenotes.com/articles/system-of-recall-advantages-and-disadvantages-of-recall-system/326

https://www.canada.ca/en/services/health/food-recalls-alerts.html

https://www.eatthis.com/major-food-recalls-february-2023/

https://www.fda.gov/safety/recalls-market-withdrawals-safety-alerts

https://hayandknight.com/blog/2020/07/why-are-product-recalls-important/

https://datadashboard.fda.gov/ora/cd/recalls.htm

https://github.com/raj-bhalodwala/Capstone-group-5